Abstract The recently discovered globin-coupled sensors
(GCSs) are heme-containing two-domain transducers distinct
from the PAS domain superfamily. We have identified an
additional 22 GCSs with varying multi-domain C-terminal
transmitters through a search of the complete and incomplete
microbial genome datasets. The GCS superfamily is composed of
two major subfamilies: the aerotactic sensors and the gene
regulators. We postulate the existence of protoglobin in
Archaea as the predecessor to the chimeric GCS.
1. Introduction

Homo- and heteromeric heme-based sensors are mediators of
cellular responses to metabolic and environmental stimuli
such as NO, CO and O2 [1]. Changes in their concentrations
are sensed by a heme moiety and result in either aerotaxis
or gene regulation, the latter by binding DNA directly,
modulating a small metabolite 2nd messenger (cyclic mono- and
di-NMPs), or directly interacting with a transcription factor
or regulator.

CooA is a CO sensor that controls the transcription of
CO-utilizing genes. Binding of CO to the heme domain of CooA
homodimers modulates the DNA-binding C-terminal domain [2].
Neuronal PAS domain protein 2 (NPAS2) is expressed in
mammalian brain tissue [3] and regulates transcription as a
heterodimer with BMAL1 [4-6]. Dissociation of the NPAS2:BMAL1
heterodimer occurs upon CO binding to the NPAS2 monomer,
effectively removing its DNA-binding, and hence transcription,
capability [3]. The soluble guanylate cyclase (sGC) contains a
heme-binding and a guanylate cyclase domain. Binding of NO to
the sGC heterodimer produces cGMP from GTP [7], whereby gene
regulation ensues by the cGMP 2nd messenger. The direct oxygen
sensor (Dos), first described in Escherichia coli [8],
functions as a tetrameric phosphodiesterase (PDE) by converting
cAMP to 5'-AMP while in the ferrous form, and is strongly
inhibited by CO and NO ligands [9]. PDEA1 from Acetobacter
xylinum (AxPDEA1) also functions as a PDE by linearizing cyclic
bis(3'-5')diguanylate, an allosteric activator of the bacterial
cellulose synthase, to the ineffectual pGpG [10,11]. Both Dos
and AxPDEA1 possess similar heme-binding PAS domains fused to
the PDE C-terminus, consisting of a GGDEF and an EAL domain.

The heme-based aerotactic transducers (HemATs) are the only
aerotactic heme sensors combining a globin domain with an
MCP-like signaling domain [17]. HemATs, originally discovered
in the archaeon Halobacterium salinarum and the Firmicute
Bacillus subtilis, are members of the family of globin-coupled
sensors (GCSs) [18,19]. Variance in the C-terminal transmitter
domain indicates that not all GCSs are involved in aerotaxis.
In this report, we further identify the diversity of these
GCSs resulting from exhaustive searches of completed and
in-progress microbial genomes. We also report their putative
functions, categorize them in relation to other non-globin
heme-based sensors, and propose two possible evolutionary
models of the GCS and globin.

2. Materials and methods 

2.1. Genome and protein sequences

The following preliminary sequence data were obtained from
the Institute for Genomic Research website: Acidithiobacillus
ferrooxidans, Bacillus anthracis, Bacillus cereus,
Carboxydothermus hydrogenoformans, and Geobacter sulfurreducens;
DOE Joint Genome Institute: Azotobacter vinelandii,
Burkholderia fungorum, Geobacter metallireducens, Magnetococcus,
Magnetospirillum magnetotacticum, Rhodobacter sphaeroides,
Rhodospirillum rubrum, and Novosphingobium aromaticivorans;
National Center for Biotechnology Information: Escherichia
coli O157:H7, Halobacterium salinarum, Agrobacterium
tumefaciens, Caulobacter crescentus, Bacillus halodurans,
Bacillus subtilis, Vibrio vulnificus, and Shigella flexneri.
The Bordetella pertussis, Bordetella parapertussis, and
Bordetella bronchiseptica sequence data were produced by the
Bordetella pertussis Sequencing Group at the Sanger Institute
and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/bp/.
At present, five genomes are incompletely sequenced and
therefore accession numbers are not available for those
proteins (see Table 1 for details).

2.2. Multiple alignments and secondary structure 

All sequences were aligned in a two-stage process. Multiple
alignments in ClustalX v1.8 [20] were followed by manual
adjustment in DNAStar's MegAlign. At this stage, globin
crystal structures (E. coli HMP, PDB ID: 1GVH; Vitreoscilla
stercoraria Hb, PDB ID: 1VHB; Ralstonia eutropha FHb,
PDB ID: 1CQX; Chlamydomonas eugametos trHb, PDB ID: 1DLY;
Paramecium caudatum trHb, PDB ID: 1DLW; HemAT-Bs, PDB ID:
1OR6) and Jnet [21] secondary structure predictions were used
as guides to produce the finished alignments in Fig. 1A.

2.3. Protein domain detection and analyses

Protein sequences were analyzed with the Pfam
(http://pfam.wustl.edu/), SMART (http://smart.embl-heidelberg.de/),
and SCOP (http://scop.berkeley.edu/) datasets and domain
descriptions were taken from the InterPro database
(http://www.ebi.ac.uk/interpro). Various BLAST and PSI-BLAST
searches were performed against the non-redundant database
and the microbial database at the National Center for
Biotechnology Information (http://www.ncbi.nih.gov/BLAST/).
Transmembrane regions were identified by the algorithms
TMHMM2 and DAS (http://www.cbs.dtu.dk/services/TMHMM-2.0/ and
http://www.sbc.su.se/~miklos/DAS/).

Fig. 1. Diversity of GCSs. The structural alignment (A) and the phylogenetic tree (B) of the GCS globin domain. A: The structural alignment
of the globin domains from 27 GCSs was created in ClustalX and MegAlign and includes the 2D structure of the recent HemAT-Bs crystal
structure (PDB ID: 1OR6) (personal communication) as a reference. The traditional helical assignments are maintained as helices A through
H, with an additional Z helix at the N-terminus. The asterisk (*) indicates the conserved proximal histidine. Amino acid conservation has been
based on an 85% consensus sequence and colors are assigned to amino acid groups as follows: charged (c, DEHKR) in white on blue
background; polar (p, KRHEDQNST) in red; turn-like (t, ACDEGHKNPQRST) in green; bulky hydrophobic (h, ACLIVMHYFW) and aliphatic
(l, LIVM) in yellow; aromatic (a, FHWY) in white on pink background; small (s, ACDGNPSTV) in purple; and tiny (u, AGS) in white on
purple background. B: The phylogenetic tree is based on the alignment of part A of this figure with branches grouping according to
transmitter type. Branches supported with bootstrap values > 5000 are indicated. Taxonomic listings for the GCS-containing organisms are given, with
the organisms' names colored according to the type of transmitter domain. Pink, GAF:EAL; orange, unclassified; blue, ERERQR:GGDEF;
purple, GGDEF:EAL; green, STAS; red, MCP or HAMP:MCP.
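The residue classes used for the 85% consensus in the Fig. 1 legend can be applied to a single alignment column as in the following minimal sketch. The class letters and residue sets are copied from the legend; the function name `column_consensus`, the choice to report the smallest qualifying class first, and the treatment of gap characters as non-members are assumptions made for illustration and are not necessarily how the published figure was produced.

```python
# Residue classes from the Fig. 1 legend (one-letter amino acid codes).
CLASSES = {
    "c": set("DEHKR"),            # charged
    "p": set("KRHEDQNST"),        # polar
    "t": set("ACDEGHKNPQRST"),    # turn-like
    "h": set("ACLIVMHYFW"),       # bulky hydrophobic
    "l": set("LIVM"),             # aliphatic
    "a": set("FHWY"),             # aromatic
    "s": set("ACDGNPSTV"),        # small
    "u": set("AGS"),              # tiny
}


def column_consensus(column, threshold=0.85):
    """Return a consensus symbol for one alignment column.

    A single residue is reported if it alone reaches the threshold;
    otherwise the smallest class whose members reach it; otherwise '.'.
    Gap characters count against every residue and class (assumption).
    """
    column = column.upper()
    n = len(column)
    if n == 0:
        return "."
    for residue in sorted(set(column) - {"-"}):
        if column.count(residue) / n >= threshold:
            return residue
    # Try the most specific (smallest) classes first.
    for symbol, members in sorted(CLASSES.items(), key=lambda kv: len(kv[1])):
        if sum(ch in members for ch in column) / n >= threshold:
            return symbol
    return "."
```

For example, `column_consensus("FYFYFYFYFW")` returns `'a'`: no single residue reaches 85%, but every residue in the column is aromatic.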

2.4. Phylogenetic analyses 

The distance tree was created using the neighbor-joining 
(ClustalX) method. Bootstraps (10000 replicates) were calcu- 
lated directly in ClustalX. Trees were generated in TreeView 
and NJPlot (distributed with the ClustalX package) and fur- 
ther refined in Adobe Illustrator 10. 
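Bootstrapping a distance tree rests on resampling alignment columns with replacement and recomputing pairwise distances for each replicate. The sketch below illustrates only that resampling step, under stated assumptions: the function names are hypothetical, the uncorrected p-distance is a stand-in for whatever distance ClustalX applies internally, and tree building itself is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    `alignment` maps sequence names to equal-length aligned strings.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def p_distance(a, b):
    """Fraction of differing positions, skipping columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name, name) tuples."""
    names = list(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}
```

Each replicate from `bootstrap_alignment` would be passed to `distance_matrix` and then to a neighbor-joining routine. With 10000 replicates, a bootstrap value above 5000 means the branch was recovered in more than half of the replicates.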

3. Results and discussion 

An exhaustive heuristic search of the non-redundant protein
database and the (un)finished microbial genome database at
NCBI yielded 27 GCSs. Criteria for identifying a putative
GCS included a primary match with the globin domain followed
by an accompanying transmitter domain(s). In addition, the
length of the globin domain was taken into account as well as
the presence of a proximal histidine. In almost all cases, a
hydrophobic aromatic residue pair at the end of the B helix
(usually Phe-Tyr) was also present. Secondary structure
prediction algorithms and the 3D-PSSM fold-recognition server
were used to support their inclusion in the family. (PSI)BLAST
was the primary search algorithm; once a GCS was identified,
it was added to the seed alignment. Since the GCS globin
domains are highly divergent, each GCS sequence added to the
growing alignment was used as a (PSI)BLAST probe for
additional candidates.
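The iterative search described above can be sketched as a loop that grows a seed set until no new candidates pass the inclusion criteria. This is an illustration under stated assumptions: `Candidate`, `passes_criteria`, and `grow_seed_set` are hypothetical names, the `search` argument is a caller-supplied placeholder standing in for a (PSI)BLAST run rather than a real BLAST API, and the length window is invented because the text gives no numeric cutoff. The Phe-Tyr pair is left out of the filter because it was present in almost all, not all, cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    globin_length: int       # residues in the matched globin domain
    has_transmitter: bool    # accompanying C-terminal transmitter domain(s)
    has_proximal_his: bool   # conserved proximal histidine present


def passes_criteria(c, min_len=120, max_len=200):
    """Apply the inclusion criteria; the length window is an assumption."""
    return (c.has_transmitter and c.has_proximal_his
            and min_len <= c.globin_length <= max_len)


def grow_seed_set(seeds, search, max_rounds=50):
    """Probe with each accepted sequence until no new candidates appear.

    `search(probe)` is a caller-supplied function returning Candidate
    hits; it is a placeholder for a (PSI)BLAST run, not a real API.
    """
    accepted = {c.name: c for c in seeds}
    queue = list(seeds)
    rounds = 0
    while queue and rounds < max_rounds:
        probe = queue.pop(0)
        rounds += 1
        for hit in search(probe):
            if hit.name not in accepted and passes_criteria(hit):
                accepted[hit.name] = hit
                queue.append(hit)  # every new member becomes a probe
    return list(accepted.values())
```

Using every accepted sequence as a new probe is what lets a search reach divergent family members that the original seed alone would miss.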

Neither the SMART database nor the manually curated
Pfam-A dataset recognizes the GCS globin domain yet,
though the automatically generated Pfam-B family 7730 has
an incomplete and partially incorrect (on the basis of the
above criteria) GCS globin domain dataset. Fig. 1A shows
the alignment of the globin domain of all 27 GCSs. The
neighbor-joining phylogenetic tree based on this alignment
is presented in Fig. 1B.

3.1. Biological heme-sensor classification 

Using the identified functions of CooA, NPAS2, sGC, Dos,
AxPDEA1, FixL, and HemATs, all currently identified
biological heme-based sensors can be classified as either
aerotactic or gene regulating. Gene regulation is observed to
occur via one of three different pathways: via protein-DNA
interaction [2-6], via modulation of small-metabolite 2nd
messengers [7-12], or by protein-protein interaction as in a
transcription factor or regulator [13-16]. The resulting
organization schema is illustrated in Fig. 2. GCSs are found
in organisms with various physiological and metabolic
systems: Gram-positive and Gram-negative, aerobic and
anaerobic, oxic and anoxic phototrophs, and even a nitrogen
fixer (A. vinelandii).

3.1.1. Aerotactic. HemATs are the only known heme-based
aerotaxis sensors [17,18] and approximately half of the
predicted GCSs are HemATs. Each possesses an N-terminal
globin domain and a C-terminal MCP-like domain. The original
HemAT signaling domain was classified as an ~MCP [17];
however, additional HemATs exhibit a ~HAMP:MCP module. Such a
combination is typical of transmitter regions of
methyl-accepting chemotaxis proteins such as the E. coli
serine receptor, Tsr, and hence these proteins may mediate
aerotaxis as well. All HemATs are soluble proteins.

The aerotactic subfamily is predominantly Gram-negative
α-Proteobacteria (nine proteins), but also includes the
Firmicutes (five proteins) and one archaeal protein. In
particular, the magnetotactic proteobacterium M.
magnetotacticum possesses two aerotactic transducers, whereas
Magnetococcus MC-1 cells possess only one. Magnetotaxis has
been shown to work in conjunction with aerotaxis [22]. Though
only a single Archaeal transducer has been found, this is not
surprising since at least half of the sequenced Archaeal
genomes do not contain recognizable taxis genes. Moreover,
the sample of sequenced Archaeal genomes (one GCS in 18
genomes, ~6%) is minuscule compared to that of the bacterial
genomes (26 GCSs in 228 genomes, ~11%).
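The two percentages quoted for the Archaeal and bacterial genomes follow directly from the counts. A short check, with the hypothetical helper name `gcs_per_100_genomes`, is sketched here; note that it counts GCS proteins per genome, which is not the same as the fraction of genomes carrying a GCS, since some organisms possess two.

```python
def gcs_per_100_genomes(n_gcs, n_genomes):
    """Return the number of GCS proteins found per 100 sequenced genomes."""
    return 100.0 * n_gcs / n_genomes


archaea = gcs_per_100_genomes(1, 18)     # counts taken from the text
bacteria = gcs_per_100_genomes(26, 228)  # counts taken from the text

print(f"Archaea:  {archaea:.1f} per 100 genomes")
print(f"Bacteria: {bacteria:.1f} per 100 genomes")
```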

3.1.2. Modulation of a 2nd messenger. Proteins possessing
the GGDEF domain have been implicated in c-diGMP modulation
[23] and eight such proteins were identified in this group,
incorporating either the GGDEF domain or a GGDEF:EAL domain
pair. Closer inspection of these proteins reveals another
highly conserved domain centered between the N-terminal
globin sensor and the C-terminal GGDEF domain. This new
domain has been designated as ERERQR, after a conserved patch
of residues (≥85% of five acidic, seven basic, 32 polar and
25 hydrophobic sites in a primarily alpha-helical and coiled
structure, data not shown). AfGReg2M has the exact C-terminal
domain organization as EcDos and AxPDEA1 (~GGDEF:EAL), PDEs
that inactivate the 2nd messengers cAMP and c-diGMP,
respectively. The GCS from B. fungorum (BfGReg) possesses a
C-terminal GAF:EAL together with an additional PAS domain.
Proteins possessing the GAF domain regulate small molecules
like cAMP and cGMP and function in transcription [23-25].

Fig. 2. Functional classification scheme of biological heme-based sensors. Heme-based sensors CooA, NPAS2, sGC, AxPDEA1, Dos, FixL,
and HemAT can be grouped according to their primary functions described in the literature. The GCSs are tentatively categorized according to
this schema. BfGReg is believed to be a gene regulator of either the 2nd messenger or transcription regulator class. No function could be
assigned to the two membrane-bound Geobacter GCSs, GsGCS and GmGCS. Domains with an asterisk (*) indicate new domains not presently
a part of the SMART database. See text and Table 1 for details.

3.1.3. Protein-protein interactions. VvGReg from V.
vulnificus possesses a C-terminal STAS (sulfate transporter
and anti-σ factor antagonist) domain recognized by Pfam as an
anti-anti-σ factor. Spore formation in B. subtilis is an
example of such a regulated process utilizing σF (a σ factor
initiating prespore development), its antagonist SpoIIAB, and
the anti-anti-σ factor, SpoIIAA. To our knowledge, VvGReg is
the first example of a globin domain coupled to a
transcriptional regulator. GCSs predicted to be involved in
DNA binding have yet to be identified.

3.1.4. Unclassified GCS. Two GCSs identified in the strictly
anaerobic δ-Proteobacteria may be involved in sulfate/sulfur
reduction. GsGCS from G. sulfurreducens and GmGCS from
G. metallireducens exhibit a bundle of four transmembrane
helices at the C-terminus that resembles either glutathione
S-transferase (GST) or ferritin-like proteins. These are
generally soluble proteins; however, a distinct microsomal
membrane-bound GST family has been identified [26,27]. Both
proteins are involved in cellular protection from toxicity of
reactive oxygen species [28].
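The functional classes of Sections 3.1.1 to 3.1.4 amount to a lookup from transmitter architecture to predicted role. The following sketch records that mapping using the six transmitter types named in the Fig. 1 legend; the dictionary name, the `classify` helper, and its string normalization are assumptions for illustration, and the assignments are the tentative ones given in the text rather than experimentally confirmed functions.

```python
# Transmitter architectures from the Fig. 1 legend, mapped to the
# tentative functional classes of Section 3.1.
TRANSMITTER_CLASS = {
    "MCP": "aerotactic",
    "HAMP:MCP": "aerotactic",
    "ERERQR:GGDEF": "gene regulation: 2nd messenger",
    "GGDEF:EAL": "gene regulation: 2nd messenger",
    # BfGReg: 2nd messenger or transcription regulator class (Fig. 2).
    "GAF:EAL": "gene regulation: 2nd messenger or transcription regulator",
    "STAS": "gene regulation: protein-protein interaction",
}


def classify(transmitter):
    """Return the tentative functional class for a transmitter string.

    Leading '~' linkers and spaces are ignored; anything not listed in
    TRANSMITTER_CLASS is reported as 'unclassified'.
    """
    key = transmitter.replace("~", "").replace(" ", "").upper()
    return TRANSMITTER_CLASS.get(key, "unclassified")
```

For instance, `classify("~HAMP:MCP")` returns `'aerotactic'`, while the four-helix transmembrane transmitters of the two Geobacter proteins fall through to `'unclassified'`.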

3.2. Phylogenetics of the GCSs 
The phylogenetic tree (Fig. 1B) suggests two
interpretations: (1) there is a predisposition of bacterial
lineages for particular signal-transducing elements, or
(2) the globin domains are customized to function in concert
with particular signal-transducing elements.

In the case of the GCS, a more evolved and ordered protein
is built up from the less ordered components; namely, the
ancestor globin, or protoglobin, and the signaling domains.
This higher ordered protein imparts a new function(s) to the
host organism that allows descendants to thrive in
environments in which they may not have been able to survive
before. Rapid response to toxic oxygen or other highly
reactive species that otherwise might quickly kill a microbe
is a significant pressure to retain such a fusion protein.
Within the tenets of biological evolution, as atmospheric
oxygen levels rose and eukaryotic cells evolved, the need for
oxygen taxis may have diminished, resulting in the absence of
such chimeric systems in the higher eukaryotes. There are
three organisms that possess two GCSs: C. crescentus, M.
magnetotacticum, and Magnetococcus. All four proteins in
C. crescentus and M. magnetotacticum are HemATs and therefore
it seems likely that they arose from gene duplication, i.e.
they are paralogs. In contrast, the two GCSs from
Magnetococcus perform different functions. One is a HemAT and
the other, a predicted gene regulator. This indicates that
each globin evolved independently with its particular
signaling domain to reflect the observed diversity (Fig. 1B)
and predicts the existence of the protoglobin in more
primitive organisms like the Archaea or the deeply branching
photosynthetic bacteria.

4. Summary 

The diversity of heme-based sensors in prokaryotes is pre- 
dominantly globin based. The family of GCSs can be grouped 
into two subfamilies, the aerotactic and the gene regulating. 
Though approximately half of the GCSs fall into the gene- 
regulating subfamily, the HemATs are the only known heme- 
based sensors involved in aerotaxis. The GCSs form a family 
of proteins (Fig. 2) that, thus far, populate all but the direct 
DNA-binding sensors. Considering the diversity of the GCSs 
and that the flavohemoglobins are similar to the GCSs, we 
propose that this form of globin was particularly suited for 
forming multi-domain chimeric proteins with novel functions. 
We postulate that protoglobin was the predecessor to the 
chimeric GCS and should therefore be found in more ancient 
organisms, like the Archaea. 